


Creators/Authors contains: "Xue, Nan"


  1. The main challenge of monocular 3D object detection is the accurate localization of the 3D center. Motivated by a new and strong observation that, in an ideal case, this challenge can be remedied by a 3D-space local-grid search scheme, we propose a stage-wise approach that combines the information flow from 2D-to-3D (3D bounding box proposal generation from a single 2D image) and 3D-to-2D (proposal verification by denoising with 3D-to-2D contexts) in a top-down manner. Specifically, we first obtain initial proposals from off-the-shelf backbone monocular 3D detectors. Then, we generate a 3D anchor space by local-grid sampling from the initial proposals. Finally, we perform 3D bounding box denoising at the 3D-to-2D proposal verification stage. To effectively learn discriminative features for denoising highly overlapped proposals, this paper presents a method that uses the Perceiver I/O model [20] to fuse the 3D-to-2D geometric information and the 2D appearance information. With the encoded latent representation of a proposal, the verification head is implemented with a self-attention module. Our method, named MonoXiver, is generic and can be easily adapted to any backbone monocular 3D detector. Experimental results on the well-established KITTI dataset and the challenging large-scale Waymo dataset show that MonoXiver consistently improves performance with limited computational overhead.
    Free, publicly-accessible full text available October 1, 2024
  2. This paper presents a neural incremental Structure-from-Motion (SfM) approach, Level-S2fM, which estimates the camera poses and scene geometry from a set of uncalibrated images by learning coordinate MLPs for the implicit surfaces and the radiance fields from the established keypoint correspondences. Our novel formulation poses some new challenges due to the inevitable two-view and few-view configurations in the incremental SfM pipeline, which complicate the optimization of coordinate MLPs for volumetric neural rendering with unknown camera poses. Nevertheless, we demonstrate that the strong inductive bias conveyed by the 2D correspondences is promising for tackling those challenges by exploiting the relationship between the ray sampling schemes. Based on this, we revisit the pipeline of incremental SfM and renew its key components, including two-view geometry initialization, camera pose registration, 3D point triangulation, and Bundle Adjustment, with a fresh perspective based on neural implicit surfaces. By unifying the scene geometry in small MLP networks through coordinate MLPs, our Level-S2fM treats the zero-level set of the implicit surface as an informative top-down regularization to manage the reconstructed 3D points, reject outliers in the correspondences by querying the SDF, and refine the estimated geometries by NBA (Neural BA). Not only does our Level-S2fM lead to promising results on camera pose estimation and scene geometry reconstruction, but it also shows a promising way toward neural implicit rendering without knowing the camera extrinsics beforehand.
    Free, publicly-accessible full text available June 1, 2024
  3. This paper studies the challenging two-view 3D reconstruction problem in a rigorous sparse-view configuration, which suffers from insufficient correspondences in the input image pairs for camera pose estimation. We present a novel Neural One-PlanE RANSAC framework (NOPE-SAC for short) that exploits the capability of neural networks to learn one-plane pose hypotheses from 3D plane correspondences. Building on top of a Siamese network for plane detection, our NOPE-SAC first generates putative plane correspondences with a coarse initial pose. It then feeds the learned 3D plane correspondences into shared MLPs to estimate the one-plane camera pose hypotheses, which are subsequently reweighted in a RANSAC manner to obtain the final camera pose. Because the neural one-plane pose minimizes the number of plane correspondences needed for adaptive pose hypothesis generation, it enables stable pose voting and reliable pose refinement with only a few plane correspondences for the sparse-view inputs. In the experiments, we demonstrate that our NOPE-SAC significantly improves camera pose estimation for two-view inputs with severe viewpoint changes, setting several new state-of-the-art results on two challenging benchmarks, i.e., MatterPort3D and ScanNet, for sparse-view 3D reconstruction. The source code is released at https://github.com/IceTTTb/NopeSAC for reproducible research.
  4. This article presents Holistically-Attracted Wireframe Parsing (HAWP), a method for geometric analysis of 2D images containing wireframes formed by line segments and junctions. HAWP utilizes a parsimonious Holistic Attraction (HAT) field representation that encodes line segments using a closed-form 4D geometric vector field. The proposed HAWP consists of three sequential components empowered by end-to-end and HAT-driven designs: (1) generating a dense set of line segments from HAT fields and endpoint proposals from heatmaps, (2) binding the dense line segments to sparse endpoint proposals to produce initial wireframes, and (3) filtering false positive proposals through a novel endpoint-decoupled line-of-interest aligning (EPD LOIAlign) module that captures the co-occurrence between endpoint proposals and HAT fields for better verification. Thanks to our novel designs, HAWPv2 shows strong performance in fully supervised learning, while HAWPv3 excels in self-supervised learning, achieving superior repeatability scores and efficient training (24 GPU hours on a single GPU). Furthermore, HAWPv3 shows promising potential for wireframe parsing in out-of-distribution images without ground truth wireframe labels.
  5. This paper studies the problem of multi-person pose estimation in a bottom-up fashion. With a new and strong observation that, in an ideal situation, the localization issue of the center-offset formulation can be remedied by a local-window search scheme, we propose a multi-person pose estimation approach, dubbed LOGO-CAP, that learns the LOcal-GlObal Contextual Adaptation for human Pose. Specifically, our approach first learns the keypoint attraction maps (KAMs) from the local keypoints expansion maps (KEMs) in small local windows. The KAMs are subsequently treated as dynamic convolutional kernels on the keypoints-focused global heatmaps for contextual adaptation, achieving accurate multi-person pose estimation. Our method is end-to-end trainable with near real-time inference speed in a single forward pass, obtaining state-of-the-art performance on the COCO keypoint benchmark for bottom-up human pose estimation. With the COCO-trained model, our method also outperforms prior art by a large margin on the challenging OCHuman dataset.
  6. We report an experimental study of the shear-induced migration of flexible fibers in suspensions confined between two parallel plates. Non-Brownian fiber suspensions are imaged in a rheo-microscopy setup, where the top and the bottom plates counter-rotate and create a Couette flow. Initially, the fibers are near the bottom plate due to sedimentation. Under shear, the fibers move with the flow and migrate towards the center plane between the two walls. Statistical properties of the fibers, such as the mean values of their positions, orientations, and end-to-end lengths, are used to characterize their behavior. A dimensionless parameter Λ_eff, which compares the hydrodynamic shear stress and the fiber stiffness, is used to analyze the effective flexibility of the fibers. The observations show that the fibers that are more likely to bend exhibit faster migration. As Λ_eff increases (softer fibers and stronger shear stresses), the fibers tend to align in the flow direction, and their motions transition from tumbling and rolling to bending. The bending fibers drift away from the walls to the center plane. Further increasing Λ_eff leads to more coiled fiber shapes, and the bending becomes more frequent and larger in magnitude, which leads to more rapid migration towards the center. Different behaviors of the fibers are quantified with Λ_eff, and the structures and the dynamics of the fibers are correlated with the migration.
  7. Air exchange between people has emerged in the COVID-19 pandemic as an important vector for transmission of the SARS-CoV-2 virus. We study the airflow and exchange between two unmasked individuals conversing face-to-face at short range, which can potentially transfer a high dose of a pathogen because the dilution is small compared to long-range airborne transmission. We conduct flow visualization experiments and direct numerical simulations of colliding respiratory jets mimicking the initial phase of a conversation. The evolution and dynamics of the jets are affected by the vertical offset between the mouths of the speakers. At low offsets, the head-on collision of the jets results in a 'blocking effect', temporarily shielding the susceptible speaker from the pathogen-carrying jet, although the lateral spread of the jets is enhanced. Sufficiently large offsets prevent the interaction of the jets. At intermediate offsets (8-10 cm for 1 m separation), jet entrainment and the inhaled breath assist the transport of the pathogen-loaded saliva droplets towards the susceptible speaker's mouth. Air exchange is expected in spite of the blocking effect arising from the interaction of the respiratory jets from the two speakers.
  8. Three-dimensional dynamics of flexible fibers in shear flow are studied numerically, with a qualitative comparison to experiments. Initially, the fibers are straight, with different orientations with respect to the flow. By changing the rotation speed of a shear rheometer, we change the ratio A of bending to shear forces. We observe fibers in the flow-vorticity plane, which gives insight into the motion out of the shear plane. The numerical simulations of moderately flexible fibers show that they rotate along effective Jeffery orbits, and therefore the fiber orientation rapidly becomes very close to the flow-vorticity plane, on average close to the flow direction, and the fiber remains in an almost straight configuration for a long time. This 'ordering' of fibers is temporary since they alternately bend and straighten while tumbling. We observe numerically and experimentally that if the fibers are initially in the compressional region of the shear flow, they can undergo compressional buckling, with a pronounced deformation of shape along their whole length during a short time, which is in contrast to the typical local bending that originates over a long time from the fiber ends. We identify differences between local and compressional bending and discuss their competition, which depends on the initial orientation of the fiber and the bending stiffness ratio A. There are two main findings. First, the compressional buckling is limited to a certain small range of the initial orientations, excluding those from the flow-vorticity plane. Second, since fibers straighten in the flow-vorticity plane while tumbling, the compressional buckling is transient: it does not appear for times longer than 1/4 of the Jeffery period. For larger times, bending of fibers is always driven by their ends.

     